import pandas as pd
import numpy as np
from lets_plot import *
LetsPlot.setup_html(isolated_frame=True)

# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html
# Include and execute your code here
df = pd.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv")

Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names in a single chart. What trends do you notice? You must provide a chart. The year labels on your charts should not include a comma.
I noticed a couple of things in this chart. Martha, Paul, and Peter were used less than Mary but stayed very consistent through the years, while Mary hit a high between 1945 and 1954 and then dropped off drastically.
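One way to check the peak described above numerically is to look up, for each name, the year with the highest count. The following is a minimal sketch, not the analysis behind the chart: the helper `peak_year_per_name` and the small `toy` frame are hypothetical, and the toy numbers are made up purely to show the mechanics. It assumes a frame with `name` and `year` columns plus a numeric count column.

```python
import pandas as pd

def peak_year_per_name(data: pd.DataFrame, value_col: str = "Total") -> pd.DataFrame:
    """Return one row per name: the year in which value_col is highest."""
    # idxmax gives the index label of the largest value within each name group
    peak_idx = data.groupby("name")[value_col].idxmax()
    return data.loc[peak_idx, ["name", "year", value_col]].reset_index(drop=True)

# Hypothetical toy data (NOT real counts), only to exercise the helper
toy = pd.DataFrame({
    "name": ["Mary", "Mary", "Mary", "Paul", "Paul", "Paul"],
    "year": [1940, 1950, 1960, 1940, 1950, 1960],
    "Total": [100, 150, 90, 40, 45, 50],
})
peaks = peak_year_per_name(toy)
print(peaks)
```

With the real data, the same helper could presumably be applied to the filtered Q1 frame, passing `"Total_States"` as `value_col`.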
# Q1
# Step 1: Sum across all state columns for total usage per row
df["Total_States"] = df.drop(columns=["name", "year", "Total"]).sum(axis=1)
# Step 2: Filter for the names and years
q1_names = ["Mary", "Martha", "Peter", "Paul"]
q1_df = df[(df["name"].isin(q1_names)) & (df["year"].between(1920, 2000))]
# Step 3: Plot the total count over time
ggplot(q1_df, aes(x="year", y="Total_States", color="name")) + \
geom_line() + \
ggtitle("Comparing the Names: Mary, Martha, Peter, and Paul (1920–2000)") + \
xlab("Year") + \
ylab("Total Count (All States Combined)") + \
scale_x_continuous(breaks=list(range(1920, 2001, 10)), format = "d")

The chart suggests that the movie Frozen did indeed increase the usage of the name Elsa. In 2013 the count is roughly just over 2,000, and then it doubles to close to 4,500.
# Q2
# Step 1: Sum across states
df["Total_States"] = df.drop(columns=["name", "year", "Total"]).sum(axis=1)
# Step 2: Filter for Elsa
q2_df = df[(df["name"] == "Elsa") & (df["year"].between(2000, 2020))]
# Step 3: Plot it!
ggplot(q2_df, aes(x="year", y="Total_States")) + \
geom_line(color="#800080") + \
geom_vline(xintercept=2013, linetype="dashed", color="red") + \
ggtitle("Popularity of 'Elsa' Over Time (2000–2020)") + \
xlab("Year") + \
ylab("Total Count") + \
scale_x_continuous(breaks=list(range(2000, 2021, 2)), format = "d")
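To put a number on a jump like the one after the dashed 2013 line, the year-over-year change can be computed directly. This is a rough sketch under stated assumptions: `add_yearly_change` is a hypothetical helper, the `toy` values are invented for illustration, and the frame is assumed to hold one row per year for a single name.

```python
import pandas as pd

def add_yearly_change(data: pd.DataFrame, value_col: str = "Total") -> pd.DataFrame:
    """Sort by year and add absolute and percent change from the previous year."""
    out = data.sort_values("year").reset_index(drop=True)
    # diff() subtracts the previous row; the first year has no predecessor, so it is NaN
    out["change"] = out[value_col].diff()
    # pct_change() is (current - previous) / previous; multiply by 100 for percent
    out["change_pct"] = out[value_col].pct_change() * 100
    return out

# Hypothetical toy data (NOT real counts)
toy = pd.DataFrame({"year": [2012, 2013, 2014], "Total": [100, 200, 500]})
changes = add_yearly_change(toy)
print(changes)
```

Applied to the filtered Elsa frame with `value_col="Total_States"`, the row for the year after 2013 would show how large the increase actually was.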